Computing and Using Lower and Upper Bounds for Action Elimination in MDP Planning

Authors

  • Ugur Kuter
  • Jiaqiao Hu
Abstract

We describe a way to improve the performance of MDP planners by modifying them to use lower and upper bounds to eliminate non-optimal actions during their search. First, we discuss a particular state-abstraction formulation of MDP planning problems and show how to use that formulation to compute bounds on the Q-functions of those problems. Then, we describe how to incorporate those bounds into a large class of MDP planning algorithms to control their search during planning. We provide theorems establishing the correctness of this technique and an experimental evaluation demonstrating its effectiveness. We incorporated our ideas into two MDP planners: the Real-Time Dynamic Programming (RTDP) algorithm [1] and the Adaptive Multistage (AMS) sampling algorithm [2], taken from the automated planning and operations research communities, respectively. Our experiments on an Unmanned Aerial Vehicle (UAV) path-planning problem demonstrate that our action-elimination technique provides significant speed-ups for both RTDP and AMS.
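The elimination step the abstract refers to can be illustrated with a minimal sketch. It is not the authors' implementation and it does not compute bounds from a state abstraction; it only assumes that, for a single state, lower and upper bounds on the optimal Q-value of each action are already available, and that the MDP maximizes reward. The function name `eliminate_actions` and the action labels are hypothetical.

```python
from typing import Dict, Hashable, List

Action = Hashable


def eliminate_actions(lower: Dict[Action, float],
                      upper: Dict[Action, float]) -> List[Action]:
    """Return the actions of one state that cannot be ruled out as optimal.

    Assumptions (not taken from the paper): rewards are maximized, and
    lower[a] <= Q*(s, a) <= upper[a] holds for every action a of the state.
    """
    if not lower:
        return []
    # The best guaranteed value obtainable from this state.
    best_lower = max(lower.values())
    # An action whose upper bound is strictly below that guarantee is
    # provably non-optimal, so the planner never needs to expand it.
    return [a for a in lower if upper[a] >= best_lower]


# Hypothetical bounds for three actions in one state.
lower = {"north": 4.0, "south": 1.0, "east": 3.0}
upper = {"north": 6.0, "south": 3.5, "east": 5.0}
print(eliminate_actions(lower, upper))  # ['north', 'east']
```

Here "south" is dropped because its upper bound (3.5) is below the lower bound of "north" (4.0). For a cost-minimizing formulation the comparison would be reversed: an action is dropped when its lower bound exceeds the smallest upper bound.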


Related resources

Improved Strong Worst-case Upper Bounds for MDP Planning

The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of ...


Estimating Upper and Lower Bounds For Industry Efficiency With Unknown Technology

After a brief review of industry-level studies in the Data Envelopment Analysis (DEA) framework, the present paper proposes inner and outer technologies for the case where only some basic information about the technology is available. Furthermore, applying Linear Programming techniques, it also determines lower and upper bounds for directional distance function (DDF) measure, overall and allocative efficienc...


Strong exponent bounds for the local Rankin-Selberg convolution

Let $F$ be a non-Archimedean locally compact field. Let $\sigma$ and $\tau$ be finite-dimensional representations of the Weil-Deligne group of $F$. We give strong upper and lower bounds for the Artin and Swan exponents of $\sigma\otimes\tau$ in terms of those of $\sigma$ and $\tau$. We give a different lower bound in terms of $\sigma\otimes\check\sigma$ and $\tau\otimes\check\tau$. Using the Langlands...


Incremental methods for computing Markov decision

Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental m...


Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) allow one to model complex dynamic decision or control problems that include both action outcome uncertainty and imperfect observability. The control problem is formulated as a dynamic optimization problem with a value function combining costs or rewards from multiple steps. In this paper we propose, analyse and test various incremental me...




Publication date: 2007